Learning Unbiased Stochastic Edit Distance in the form of a Memoryless Finite-State Transducer

Authors

  • Jose Oncina
  • Marc Sebban
Abstract

We aim to learn an unbiased stochastic edit distance, in the form of a finite-state transducer, from a corpus of (input, output) pairs of strings. Contrary to other standard methods, which generally use the Expectation Maximization algorithm, our algorithm learns a transducer independently of the marginal probability distribution of the input strings. Such an unbiased approach requires optimizing the parameters of a conditional transducer instead of a joint one. This transducer can be very useful in many domains of pattern recognition and machine learning, such as noise management or DNA alignment. Several experiments carried out with our algorithm show that it is able to correctly assess theoretical target distributions.
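To make the idea of a conditional (rather than joint) memoryless transducer concrete, the following is a minimal sketch of how p(y | x) could be computed by a forward recursion over insertions, deletions, and substitutions. The abstract does not give the parameterization, so the function name `conditional_edit_probability`, the dictionary layout `cond[(b, a)]` for the probability of emitting `b` given input symbol `a`, the `stop` termination probability, and the toy values are all assumptions made for illustration; this is not the authors' learning algorithm, only an evaluation of a fixed, hand-set model.

```python
from typing import Dict, List, Tuple

LAMBDA = ""  # stands for the empty symbol (assumed convention)


def conditional_edit_probability(
    x: str,
    y: str,
    cond: Dict[Tuple[str, str], float],
    stop: float,
) -> float:
    """Forward computation of p(y | x) for a memoryless edit model.

    cond[(b, a)] is the assumed probability of producing output symbol b
    given input symbol a; a == LAMBDA means insertion, b == LAMBDA means
    deletion. stop is the assumed termination probability.
    """
    n, m = len(x), len(y)
    alpha: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete x[i-1]
                alpha[i][j] += alpha[i - 1][j] * cond.get((LAMBDA, x[i - 1]), 0.0)
            if j > 0:  # insert y[j-1]
                alpha[i][j] += alpha[i][j - 1] * cond.get((y[j - 1], LAMBDA), 0.0)
            if i > 0 and j > 0:  # substitute x[i-1] by y[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * cond.get((y[j - 1], x[i - 1]), 0.0)
    return alpha[n][m] * stop


# Toy model over the alphabet {a, b}; values are illustrative only.
# For each input symbol, substitution + deletion + total insertion mass = 1,
# and total insertion mass + stop = 1.
toy_model = {
    ("a", "a"): 0.7, ("b", "a"): 0.05, (LAMBDA, "a"): 0.05,
    ("a", "b"): 0.05, ("b", "b"): 0.7, (LAMBDA, "b"): 0.05,
    ("a", LAMBDA): 0.1, ("b", LAMBDA): 0.1,
}
toy_stop = 0.8

if __name__ == "__main__":
    print(conditional_edit_probability("a", "a", toy_model, toy_stop))
```

Because the probabilities are conditioned on the input symbol, the value returned for a pair (x, y) does not depend on how frequent x is in a training corpus, which is the sense in which such a model is independent of the input marginal.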


Similar resources

Using Learned Conditional Distributions as Edit Distance

In order to achieve pattern recognition tasks, we aim to learn an unbiased stochastic edit distance, in the form of a finite-state transducer, from a corpus of (input, output) pairs of strings. Contrary to state-of-the-art methods, we learn a transducer independently of the marginal probability distribution of the input strings. Such an unbiased approach requires optimizing the par...


Stochastic Contextual Edit Distance and Probabilistic FSTs

String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance—a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-...


Learning Conditional Transducers for Estimating the Distribution of String Edit Costs

We focus on the edit distance and propose an algorithm to learn the costs of the primitive edit operations. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which is neither deterministic nor stochastic in the standard terminology. Moreover, this transducer is conditional, and thus independent of the distributions of the input strings. Real worl...


Learning stochastic finite-state transducer to predict individual patient outcomes

High-frequency data in the intensive care unit is flashed on a screen for a few seconds and never used again. However, this data can be used by machine learning and data mining techniques to predict patient outcomes. Finite-state transducers (FSTs) have been widely used in problems where sequences need to be manipulated and insertions, deletions and substitutions need to be modeled. In...


Finite Growth Models

Finite growth models (FGM) are nonnegative functionals that arise from parametrically-weighted directed acyclic graphs and a tuple observation that affects these weights. The weight of a source-sink path is the product of the weights along it. The functional's value is the sum of the weights of all such paths. The mathematical foundations of hidden Markov modeling (HMM) and expectation maximizat...




Publication date: 2005